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The central mean subspace (CMS) and iterative Hessian trans- 
formation (IHT) have been introduced recently for dimension re- 
duction when the conditional mean is of interest. Suppose that X 
is a vector-valued predictor and y is a scalar response. The ba- 
sic problem is to find a lower-dimensional predictor 77 X such that 
E(Y\X) = E(Y\r]'^X). The CMS defines the inferential object for this 
problem and IHT provides an estimating procedure. Compared with 
other methods, IHT requires fewer assumptions and has been shown 
to perform well when the additional assumptions required by those 
methods fail. In this paper we give an asymptotic analysis of IHT 
and provide stepwise asymptotic hypothesis tests to determine the 
dimension of the CMS, as estimated by IHT. Here, the original IHT 
method has been modified to be invariant under location and scale 
transformations. To provide empirical support for our asymptotic re- 
sults, we will present a series of simulation studies. These agree well 
with the theory. The method is applied to analyze an ozone data set. 

1. Introduction. The basic problem of dimension reduction for regres- 
sion [Li (1991, 1992), Cook and Weisberg (1991) and Cook (1998a)] is to 
find a lower-dimensional predictor that carries all the information relevant 
to the regression. Suppose that X is a p-dimensional predictor and y is a 
scalar response. If there is a. p hy q, q < p, matrix rj such that the q lin- 
ear combinations rj^X fully describe the conditional distribution of Y given 
X, then the subspace spanned by the columns of rj is called a dimension 
reduction subspace. In symbols, if 

Y_LX\ifX, 
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then the column space of 77 is a dimension reduction subspace. Here _1L 
stands for independence, so the statement is that Y is independent of X 
given rj^ X. Any subspace that contains a dimension reduction subspace is 
itself a dimension reduction subspace. Under mild conditions the intersection 
of all dimension reduction subspaces is itself a dimension reduction subspace 
and then is called the central subspace (CS), and written as Sy\x- If CS 
is known, then X can be replaced with Psy^-^X without loss of information 
on the conditional distribution of y|X, where indicates a projection in 
the usual inner product onto the indicated subspace. 

If, as in many regression analyses, the conditional mean E{Y\X) is of 
particular interest, then it is possible, and beneficial, to carry out dimen- 
sion reduction for that purpose. Cook and Li (2002) formulated dimension 
reduction in this context as follows. If there is a p by q matrix 1] such that 

E{Y\X) = E{Y\rfX), 

then the column space of is a dimension reduction subspace for the condi- 
tional mean, and is called a mean subspace. Under mild conditions, the in- 
tersection of all such subspaces is again a mean subspace and then is called 
the central mean subspace (CMS), and written as Se(y\x)- See Li, Cook 
and Chiaromonte (2003) and Yin and Cook (2002) for other developments 
related to the CMS. 

Several benefits accrue from studying the conditional mean E{Y\X) rather 
than all of y I X: 

1. Because Se{y\x) ^ Sy\x-, it ni^-y possible to achieve further reduction 
of dimension. 

2. As in classical estimation, focusing on a smaller inferential object could 
lead to increased accuracy. Here Se{y\x) ^-cts as the "parameter of inter- 
est" and all aspects of the conditional distribution of Y\X not described 
by the conditional mean act as the nuisance parameter. 

3. Study of Se(y\x) leads to a categorization of several existing methods, 
such as ordinary least square estimates (OLS) [Li and Duan (1989)], sliced 
inverse regression (SIR) [Li (1991)], principal Hessian directions (PHD) 
[Li (1992)] and the sliced average variance estimator (SAVE) [Cook and 
Weisberg (1991)]. It thus provides further insight into the dimension re- 
duction problem. 

As demonstrated by Cook and Li (2002), these four methods estimate either 
the CS or the CMS under the first or both of the following two conditions: 

(A) Linearity condition: E{X\PsX) is a linear function of X, 

(B) Constant covariance condition: Var{X\PsX) is a nonrandom matrix, 

where the subspace S is either the CS Sy^x or the CMS Se(y\x)j depending 
on the method. In particular, using S = Se[y\x)i OLS and PHD estimate 
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vectors in the CMS, with OLS requiring condition (A) and PHD requiring 
both conditions. Using S = 5y|x, SIR and SAVE estimate vectors in the 
CS, with SIR requiring condition (A) and SAVE requiring both conditions. 

Condition (A) holds for all subspaces of if the predictor X has an ellip- 
tical distribution [Eaton (1986)], and it holds approximately if dim(5) <^p 
[Hall and Li (1993)]. Condition (B) is more stringent but will be satisfied if 
X has a multivariate normal distribution. It is noteworthy that both con- 
ditions apply to the marginal distribution of the predictors and not to the 
conditional distribution of y|X as is common in regression modeling. Con- 
sequently, we are free to use experimental design, predictor transformations 
or re-weighting [Cook and Nachtsheim (1994)] to induce the conditions as 
necessary without suffering complications when inferring about E(Y\X) or 
Y\X. 

In some practical problems, such as in the recumbent cow data [Clark 
et al. (1987); see also Cook and Li (2002)], there is significant heteroscedas- 
ticity among the predictors. In such cases condition (B) fails and application 
of PHD and SAVE becomes problematic. Also, OLS provides at most one 
vector, and so does SIR if the response Y is binary as it is in the recumbent 
cow data. Hence if the dimension of the CMS or the CS is 2 or more, then 
OLS and SIR will necessarily miss part of the CMS and CS. 

It is in this context that Cook and Li (2002) introduced the method 
of IHT, presenting two versions of IHT that both estimate vectors in the 
CMS. Assuming that X is standardized to have mean zero and covariance 
matrix Ip (the identity matrix of dimension p) , one version uses the response- 
based (y-based) Hessian matrix E({Y — E{Y))XX'^) and the other uses the 
residual-based (r-based) Hessian matrix 

H = E{{Y - E{Y) - E{YX'^)X)XX'^). 

Both versions require only condition (A) but, like PHD, can estimate mul- 
tiple vectors in the CMS. Following the findings of Cook (1998b) on the 
general superiority of r-based PHD over y-based PHD, we use the r-based 
Hessian matrix H in the rest of this article. 

Cook and Li (2002) demonstrated the following fundamental relation, 
which is the basis for IHT. Under condition (A) alone, the CMS is an in- 
variant subspace of the linear transformation H, that is, 

HSe(y\x) ^ ■Se{y\x)- 

It follows that if we know any nonzero vector in the CMS, then we can 
transform it iteratively by the Hessian matrix to bring out other vectors in 
the CMS. An obvious initial vector is the OLS vector, which we know belongs 
to the CMS under condition (A) alone. Thus if we use P to denote the OLS 
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vector, which, assuming X to be standardized, has the form (3 = E{YX), 
then the vectors 

are aU in the CMS. At a certain point, say at H^~^(3 (with k < p), one 
more iteration ceases to bring out a Hnearly independent vector, and all 
subsequent vectors in the sequence must also be linear combinations of the 
first k vectors. 

In brief, under the linearity condition (A) the IHT subspace 

5iHT = Span{/3, H(3, H^f3, = Span{/?, if/3, H^(3, 

is contained in the CMS, 5iht ^ Se(y\x) [Cook and Li (2002)]. The task of 
this paper is to estimate the dimension k of cSiht? the number of linearly 
independent vectors generated by the iterative transformations [5, j = 
0,1, . . . ,p — 1, and thereby obtain an estimator of 5iht- Of course, if we 
know (3 and H, then all we need to do is to check at each step j whether 
the smallest singular value of the matrix {(3, . . . , 13), j = 0, . . . ,p — 1, is 
zero, and stop as soon as it is. At that point, k = j. However, in practice, 
H and f3 are replaced by their sample estimates, say H and f3. Whereas in 
the population sequence {P, Hf3, . . .} the smallest singular value becomes 
zero after a certain point, in the sample sequence HP, . . .} the smallest 
singular value becomes small, rather than zero, after a certain point. So our 
task is to deduce an asymptotic distribution against which we can judge if 
the observed smallest singular values correspond to singular values of in 
the population. 

Procedures for determining the order of a dimension reduction space via 
sequential testing of hypotheses were developed previously for other dimen- 
sion reduction methods. For example, Li (1992) developed a testing proce- 
dure for PHD, and Li (1991) and Schott (1994) developed testing procedures 
for SIR. 

It should be mentioned that, although IHT can bring out multiple vec- 
tors in the CMS, at the present stage we do not have a rigorous set of 
sufficient conditions that guarantees IHT will actually cover the CMS. How- 
ever, based on our numerous experiences with real data, IHT often does well 
in bringing out the full pattern in the conditional mean. Hence in part of the 
subsequent development (i.e., the constrained case) we take the pragmatic 
approach of making coverage a working assumption at the outset. Indeed, 
the issue of coverage is challenging, and to date there has not been a general 
result published in this regard. For this reason a similar working assump- 
tion is typically adopted for the asymptotic development of other methods, 
such as those for SIR and PHD [Li (1991, 1992)]. Or, alternatively, the null 
hypothesis is formulated directly on the rank of the population matrix cor- 
responding to the estimator rather than on the dimension of the CS [Schott 
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(1994)]. In fact, we are inclined to believe that IHT is more comprehensive 
in estimating the CMS than using OLS or PHD alone, because it can pick 
up both monotone and U-shaped trends, so long as it has a nonzero vector 
such as OLS to prime the process. Cook and Li (2002) argued that the span 
of OLS and PHD is a subset of the CMS, and it seems that a combination 
of them should provide a reasonably comprehensive estimator of the CMS. 
IHT can be viewed as one way of combining these elements, without evoking 
the constant covariance condition that is required by PHD. 

The rest of the paper is organized as follows. In Section 2 we introduce 
a version of IHT that is modified slightly from that of Cook and Li (2002) 
for an invariance consideration. We also formulate the hypothesis testing 
problem and establish initial asymptotic expansions. In Section 3 we derive 
the asymptotic distribution under certain assumptions on the predictor X 
and its relation with the response Y . In this case the limiting distribution is a 
chi-squared distribution (Theorem 4). In Section 4 we derive the asymptotic 
distribution when no practically restrictive conditions are placed on X or 
Y (Theorem 5). In Section 5 we discuss the implementation of the tests in 
both cases. We check our theoretical conclusions against simulated results 
in Section 6, and we apply both procedures to analyze an ozone data set. 

2. Foundations. 

2.1. Invariant IHT. Let (Xi, li), . . . , (X„, 1^) be n independent copies 
of {X,Y), in which the predictor X is a random vector in W and the re- 
sponse y is a scalar random variable. We assume throughout this article 
that Var(X) is positive definite. As demonstrated in Cook and Li (2002), 
the CMS is invariant under affine transformation of X: For any nonsingular 
phy p matrix A and p-dimensional vector 6, 

Thus, without loss of generality, we prestandardize and use Z = Var(X)~^/^(X — 
E{X)) as the predictor vector so that E{Z) = and Var(Z) = Ip. In what 
follows, we first estimate the standardized subspace Se(y\z) along with its 
dimension, and then transform back to estimate Se(y\x)- 

Let Z and S be the sample mean and sample covariance matrix of Z: 

Z = En{Z) and T, = En{Z -Z){Z -Zf , 

where Enf{Z) stands for 77,"^ ^27=1 fi^i)- Let Z be the standardized Z, 

Z = T.-^/'^{Z -^), 

and let 

H = En{eZZ^}, 
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where e is the observed regression error Y -Y- p^Z with P = En{Z{Y - 
Y )). This matrix was suggested in Cook and Li (2002) as the transformation 
matrix in the r-based IHT method. However, in practice, it is desirable to 
make IHT invariant under affine transformation of both Z and Y; that 
is, conclusions drawn from {Z,Y) should be identical to those drawn from 
(AZ + 6, cY + d) , where A is any p hy p nonsingular matrix, b is any p- 
dimensional vector, c is a nonzero scalar and d is any scalar. For this purpose 
we will replace Y — Y in the above transformation by its standardized version 

Y = a-^{Y -Y), 

where &^ = EniY -Yf, and use the transformation matrix 

H = En{eZZ^}, 

where e = Y — 0^ Z with [3 being the regression estimate En{YZ). For con- 
sistency in the rest of this article, we now redefine H and (3 to be the 
population versions of H and /3, 

H = Ei[{Y - E{Y))/a - fZ]ZZ^), 

where j3 = E{{Y - E{Y))Z) ja. 

2.2. Formulation of hypotheses. Let 

B = {l3,Hp,...,HP-^/3) and B = 0,HP, . . . ,HP-^p). 

We estimate the rank of B, which is equal to the dimension of 5iht since 
^iHT = Span(i?), by conducting a series of hypothesis tests. Let Ai > A2 > 
• • • > Ap be the eigenvalues of BB^ , and consider the sequence of tests 

Hqj : Xj+i = • • • = Ap = 0, j = 0,l,...,p- I. 

The rank A; of is the smallest value of j for which this hypothesis holds. 
Let Ai > A2 > • • • > Ap be the eigenvalues of nBB^ . We test Hqj using the 
statistic 

p 

Tj = C"^ ^ Xi, 

i=j+i 

where C is a positive constant that depends on j and will be determined 
later. Relatively large values of Tj provide evidence against Hqj. Tests of 
Hqj are used to estimate the rank A; of i? as follows: Beginning with j = 0, 
test -ffo,o- If the hypothesis is rejected, increment j by one and test again, 
stopping with the first nonsignificant result. The corresponding value of j is 
the estimate k of k. Procedures of this form are fairly common for estimating 
the rank of a matrix; see, for example, Rao [(1965), page 472]. 
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2.3. Initial asymptotic equivalences. In this section we characterize the 
components of in terms of their asymptoticahy equivalent variables, and 
provide expansions that will be useful when studying the distribution of 
in later sections. 

First, consider the singular value decomposition of B: 

(1) «Mr.r„,(° »)(<), 

where T = (Fi, Fq) and ^ = ^'o) are p by p orthonormal matrices, D is 
a k hy k diagonal matrix with positive diagonal elements, Fi and ^'i have 
dimension p hy k, and Fq and ^'o have dimension p hy p — k. It follows 
from Eaton and Tyler (1994) that the joint asymptotic distribution of the 
p — k smallest singular values of the matrix y/nB is the same as that of the 
singular values of the matrix ■y/nF^(i? — i?)^o- Therefore the asymptotic 
distribution of CT^ is the same as that of 

n{vec[F^(5 - B)^o]f Yec{TUB - B)^o], 

where vec is the usual operator that maps a matrix to a vector by stacking its 
columns: if ^ is a matrix with columns ai, . . . , a^, then vec(^) = {aj ■ ■ ■a^)'^ . 
Thus determining the asymptotic distribution of boils down to computing 
the asymptotic distribution of 

The estimate i? is a function of (3 and H, both of which are essentially 
(though not exactly) sums of independent and identically distributed (i.i.d.) 
random variables. So the key is to expand 

VHvec[F^(^-5)^o] 

as a function 

of sums of i.i.d. random variables. 

First, expand H^j3 so that the remainder is of the order Op(n~^), i = 
1, ... ,p — 1. Starting with 

(2) W^P = {H + {H-H)y{(3 + 0-(3)}, 

the term {H + {H — H)Y can be expanded as the sum of 2* terms, each being 
of the form Gi • • • Gj, where the G's can be either H or H — H. However, 
those Gi - ■ -Gi terms involving two or more H — H are of the order Op{n~^) 
or smaller and can be dropped. For the terms involving only one H — H, the 
i — 1 H^s appear either on the left, or right, or both sides, oi H — H . In other 
words they can be expressed as H^{H — H)H'^~^~^ , where j = 0, . . . ,i — 1. 
Hence we have the following expansion: 

i-l 

{H + {H- H)y =H' + Y, H^{H - H)W~^~^ + Op{n~^). 

j=0 

Substitute this expansion into (2) to obtain 

j=0 
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(3) 

i = l,...,p-l. 

We next further expand (3 — (3 and H — H as functions of sums of i.i.d. 
random variables. This is given in the next lemma; its proof is provided in 
the Appendix. 

Lemma 1. Under regularity conditions we have the following expansions: 
En{ZY)-(3-\En{ZZ'' -Ip)f3 
-\En{Y^ -l)(5 + Op{n-^), 
En{e{ZZ^ - Ip) -H}- \En{ZZ^ - Ip)H 
- \HEn{ZZ^ - Ip) - lEn{Y^ - l)H + Op(n-i). 

Here, the "regularity conditions" refer to those under which the central limit 
theorem applies to averages of i.i.d. random variables, which in our case are 
guaranteed if Z has finite fourth moments. 

In the next section we derive the asymptotic distribution of under a set 
of constraints on X and Y . These constraints are similar to those imposed 
on SIR and PHD to produce chi-squared asymptotic distributions [Li (1991, 
1992), Cook (1998b) and Bura and Cook (2001)]. We refer to this case as 
the constrained case. We then derive the asymptotic distribution without 
these conditions. While the results for the general case can be applied to the 
constrained case, the latter takes advantage of the structures imposed and 
performs better if the constraints are satisfied. It also has a simple form of 
a chi-squared distribution which is easy to use. 

3. Asymptotic distribution for constrained case. 

3.1. Constraints. In the constrained case we assume that: 

(CI) The span of the IHT vectors exhausts the CMS, 5iht = Se{y\z)- 
(C2) The predictor Z is normally distributed. 

(C3) E{e^\Z) = E{e'^\Psj,^y^2^Z), where e = Y- f3^Z is the population re- 
gression error. 

These assumptions are similar in spirit to those imposed on the constrained 
cases of PHD and SIR. Under the linearity condition (A) alone, 5iht ^ 
Se{y\z)- III condition (CI) we carry this a step further and assume equal- 
ity. This implies, for example, that if dim{SE(Y\z)) > Oj then we must have 
/3 7^ 0. Condition (C3) says that S^^e'^iz) ^ 'Se{y\z)- That is, Se(y\z) must 



(4) 



(5) 



H-H- 



DIMENSION OF ITERATIVE HESSIAN TRANSFORMATION 9 



be a mean subspace for the regression of on Z. This means that any het- 
eroscedasticity present in the residuals must depend only on directions in 
the CMS. Conditions (C2) and (C3) are used to force a simple chi-squared 
asymptotic distribution for T^. 

Because Z is normal, both the linearity condition (A) and the constant 
covariance condition (B) hold and thus Span(/3,i?) C Se(y\z)- Also, for any 
integer j > 0, P S Span(ii'), implying that 5iht ^ Span(/3,ff). Hence, it 
follows from condition (CI) that 

(6) SmT = Span{(3,H)=SE{Y\z)- 

Because r-based PHD [Li (1992)] is designed to estimate Span(ff), it follows 
that in the constrained case IHT combines r-based PHD with OLS. Cook 
(1998b) found that y-based PHD is not very effective at finding linear trends 
and that the best results in practice are often found by informally combining 
OLS with r-based PHD. IHT is the first formal method for combining OLS 
with r-based PHD, making use of (3 to find linear trends in the mean function 
and H to find curvature. 

We next consider the asymptotic distribution of in the constrained 
case, picking up the general argument at the end of Section 2.3. 

3.2. Expansion of y/nveclTQ {B — i?)^'o]. From the singular value de- 
composition (1) we know that TqB = 0. That is, the columns of Tq are 
orthogonal to the columns of B, and by (6) they are also orthogonal to the 
columns of (/3,-ff) because 5iht = Span(i?). Thus, if we multiply both sides 
of (3) by the matrix Tq from the left, all the terms that begin with an H 
drop, and we have 

T^{H'P - H'P) = VliH - H)H'-^P + Op(n-i). 

It follows that 

V^VliB - B)^o 

(7) = V^T^0 -(3,{H- H)P, ...,{H- H)HP-^P)^o + 0^{n-'l^) 

= V^Tl0 -(3,{H- H)Bo)^o + Op(n-i/2), 

where Bq = {P, . . . , HP~^/3). 

Observe that, in expansions (4) and (5), the terms 

-A -En{Y^-l)(3, H, HEniZZ^ -Ip)/2, En{Y^-l)H/2 

vanish if we multiply them by from the left. Therefore, Tq' (5 — 5)^*0 
reduces to 

r^{En{ZY) - En{W)f3/2, (EnieW) - En{W)H /2)Bo)^o + Op(n-^), 
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where W stands for the matrix ZZ — Ip. Using the relation B = {j3, HBq) 
we can rewrite the above matrix as 

r^{En{ZY),En{eW)Bo)^f) - ir^^„(t^)S*o + Op{n-^). 

Note that the second term drops because B^q = 0. Furthermore, the identity 
matrix Ip in W = ZZ^ — Ip also drops because it is to be multiplied from the 
left by Fg" and from the right by Bq, and the columns of Fq are orthogonal 
to the columns of Bq , which consists of the first p—1 columns of the matrix 
B. To conclude, we have the following expansion for ^/uTqIIS — B)'^q. 

Theorem 1. Under conditions (CI) and (C2) for the constrained case, 
we have 

(8) V^r^{B - B)^o = V^T^{En{ZY),E4eZZ^)Bo)^o + Op{n-^/^). 

The right-hand side of (8) can be further simplified using the properties 
of ^0- We will do this in two separate cases, /? G Span(ff) or /3 ^ Span(//), 
and then synthesize them into a simple and general formula. 

3.3. Case I: /? ^ Span(i/). The next lemma describes the structure of 
^0 in this case, which is the key to the simplification. Its proof is given in 
the Appendix. 

Lemma 2. If (3 ^ Span{H) , then: 

(i) the first row of is a zero vector, and 

(ii) the second row of is not a zero vector. 

To simplify the expansion of y/nTQ{B — B)"^q using this result, rewrite 
the expansion (8) as 

r^{B - B)^o = (^„(F^Zy), E„(eF^ZZ^Bo))^o + Op{n-^) 

(9) 

= En{eTlZZ'"Bo)^o + Op{n~'), 

where is the p—1 hy p — k matrix comprising the second through the 
pth rows of the phy p — k matrix ^q. Furthermore, because the first row of 
^'o is and because B'^q = 0, we have 

{H/3,...,HP-^f3)<i>o = HBo^o = 0. 

In other words, the columns of the matrix Bq^q are orthogonal to the 
columns of H. Consequently, letting Qh = Ip — ^Span(_ff)) 

(10) Bo<^o = QhBo<^o = {QhP, 0, . . . , 0)$o = QhPo^, 
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where Oq is the first row of the matrix <I>o ) which by Lemma 2 is a nonzero 
vector. 

Substitute (10) into the right-hand side of (9) to obtain 

rl{B - B)^o = EnieVlZZ^QHpal) = En{eUV^) + Op(n"i), 
where U = TqZ and V = uqP'^QhZ. Hence 

= V^EnieV [/) + Op(n-i/2). 

Letting the columns of the p hy k matrix 7 be an orthonormal basis for 
5iHT> we see that V = ao{P^Z — f3'^^j'^ Z) is measurable with respect to 
because /? G 5iht- This implies that E{eV U) = 0, as can be seen 
from the following derivation: 

E{eV ^U) = E{E{e\Z)V U) 

= E{E{e\-i^Z)V(^U) 

= E{eE{V^U\-/^Z)) 

= E{eV ^ E(U\-f^ Z)) 

= E{eV)0E{U) = O, 

where, for the second equality we used the definition of the CMS, for the 
fourth we used the measurability of V with respect to j'^ Z, and for the 
fifth we used the independence between U and 'y'^ Z, which follows from the 
normality of Z and the orthogonality between the columns of Tq and the 
columns of 7. 

Hence, by the central limit theorem, ^/riEn{eV U) is asymptotically 
normal with mean and covariance matrix i?{e^(y dS) U){V U)'^}. We 
now simplify this covariance matrix: 

E{e^{V 0U){V^ Uf} = E{e^{VV^ UU^)] 

= E{E{e'^\-f'^Z){VV^ UU'^)} 

= E{e^E{VV^ ® UU^\-i^Z)] 

(11) 

= E{{e^VV'^) ® E{UU^\-f'^Z)} 

= E{e^VV^)(^E{UU^) 

= ^(eVy^)®/p_fc. 

For the second equality we have used the assumption E{e^\Z) = £'(e^|7^Z), 
and for the last equality we have used the fact that E{UU'^) = T'qE{ZZ'^)Yq = 
Ip~k- The rest of the equalities follow from the similar argument we used in 
the demonstration of E[eU ®V) = Q. 
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Now substitute the definition V = aQ0^ QhZ into the expression 

Ip—k ■ 

{f3'^QHE{e^ZZ^)QHf3){aoa^ Ip.k) = Eie^^ QuZfia^al ® V^) 

= C(aoao/||aof) ® ^p-fc, 

where C = E{e(3'^ Q h Z)'^\\aQ\\'^ . It is easy to verify that any matrix of the 
form aa^ ® Im, where a is a unit vector, is an idempotent matrix of rank 
m. Therefore, 



V^r^{B - B)^o/VC ^ N{0, R), 

where R is an idempotent matrix of rank p — k. So we have proved the 
following theorem. 

Theorem 2. Let Ai > • • • > Ap be the eigenvalues of the matrix nBB^ . 
Suppose that conditions (C1)-(C3) hold and that (3 ^ Span(ff). Then, under 
the null hypothesis ifo,fc : A^+i = ■ • ■ = Xp = 0, we have 

i=k+l 

where C = E{ef3'^QHZ)'^\\ao\\'^ , ao being the second row of the matrix ^'o- 

3.4. Case II: /3 G Span(i?) . As in the previous case, the special structure 
of the matrix ^'o under /3 S Span(i?) plays a critical role in the simplification 
of the expansion of ^/nTQ{B — B)"^q. This structure is described in the next 
lemma, and proved in the Appendix. 

Lemma 3. If [3 ^ Span(//), then the first row of is not 0. 

Now write 

^0^ 



^0 
$0 

where tq is a vector in and <I>Oi as before, isap— 1 hy p — k matrix. 

Since B^q = 0, we have 

HB^^^ + (3t^ = 

Because (3 G Span(i/), (3 = Hi] for some r] in W. Hence 

HBo^^ + Hr^T^ = H{Bo<^o + Vr^) = 0- 

That is, the columns of the matrix Bq^q + tjtq are orthogonal to the rows 
(and hence columns) of H. Consequently, 

Bo<^o + VT-Q = Qh{Bq^o + Wq)^ 
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where, as before, Qh is the projection matrix onto the orthogonal comple- 
ment of Span(//). Since (3 £ Spaia{H), Span(i?o) ^ Span(ii') and therefore 
QhBq^q = 0. Hence, letting "f" denote the Moore-Penrose generalized in- 
verse, 

Bo<^o = -r]T^ + QhVTo = -{I - Qh)vto 
= -H{HHyHr]T^ = -H{HHy(3T^. 

From the definition of the Moore-Penrose generalized inverse of a symmetric 
matrix it is easy to see that H{HH)^ = Hi Therefore, 

(12) So^o = -H^f^T^- 



^l/2^ 



Rewrite expansion (8) as 

V^rliB - B)^o = V^{En{YU),En{eUV^))^o + Op{n , , 

where U is the (p — A;) -dimensional vector Pg'Z and V is the (p — l)-dimensional 
vector Bq Z. Note that the V here is different from that defined in Sec- 
tion 3.3, but U denotes the same quantity. Since the columns of Bq belong 
to Span(i?), V is measurable with respect to j'^Z, and since the columns 
of Fq are orthogonal to Span(i?), U and j'^Z are independent. Thus, fol- 
lowing the same argument used in Section 3.3 for the demonstration of 
E{eV [/) = 0, we can show that E{YU) = and E{eUV^) = 0. Hence the 
vector 

vec{{En{YU), Er,{eUV^))^o} = V^i^I ® V^) (^^^^^^^[^^ 

is asymptotically multivariate normal of dimension {p — k)'^ with mean 
and variance matrix 

(13)(*o ®^v~^)yE{eY{V®U)U^) E{e\V ®U){V ^Uf)) ^^'® ^^^'^^ 

We next simplify this covariance matrix. 

By an argument similar to (11), we can show that 

E{Y^UU'^) = E{Y^)Ip_k, 

E{e^{V ^U){V(S, Uf) = E{e^VV^) Ip_k- 

To derive E{eYU {V U)'^), note that U = ICiSiU, where 1 is the scalar one. 
Hence 

E{eYU{V ^7)^) = E{eY{l ^U){V0 Uf) = E{eY{V^ UU^)). 
Now apply the argument leading to (11) to obtain 

E{eYU{V ® uf) = E{eYV^) ® Ip.k- 
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Hence the asymptotic variance (13) now becomes 

i^U^o) ® Ip-k where A = ^^^'^^ E{fvV^) ) ' 
Expressing ^'q as {tq,^'q), we can rewrite the matrix '^^A^q as 
A = E{Y^)tot^ + ToE{eYV'^)^o 

(14) 

+ ^lE{eYV)T^ + ^lE{e^VV^)^o. 

Now recah that Bq^q = -H^(3t^. So 

= $^5(f Z = -Top'^H^Z. 

Substitute this relation into (14) to obtain 

A = {E(Y^) - 2E{eYZ^)H^P + [3^ E{e^ Z Z^)H^ I3}tqt^ 

= E{Y - eZ'^H^fjf ToTo^ = Cu^t^ /\\t4\ 

where Ci is the constant E{Y - eZ'^ (5f\\TQf . Note that tqt^ /Wtq^ is 
an idempotent matrix of rank 1, and (Torg'/llrop) ® Ip-k is an idempotent 
matrix of rank p — k. Therefore 

V^r];{B - B)^o/^i ^ N{0, R), 

where R is an idempotent matrix of rank p — k. We have proved the following 
theorem. 

Theorem 3. Suppose that conditions (C1)-(C3) hold and that [3 belongs 
to Span(/f). Then, under the null hypothesis -ffo,fc ■ ^k+i = " • • = '^p = 0, we 
have 

i=k+l 

where Ci = E{Y — eZ"^ P)'^\\tq\\'^ , with Tq being the first row of the matrix 

3.5. Synthesis of the two cases. To apply directly the asymptotic results 
developed in Sections 3.3 and 3.4, one must determine at the outset whether 
f3 belongs to Span(if), which would likely be problematic in practice. In this 
section we derive a general result that synthesizes the two cases. This enables 
us to apply the test in the constrained case without having to know whether 
f3 belongs to Span(iJ) ahead of time. 

Recall from (10) and (12) that 

Bn^n = < 

[-H^PtJ, if /3ESpan(i7). 
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Using this relation we can rewrite the constants C and Ci as 
C = iT{aoE{e^fQHZZ'^QHl3)al] 

(15) 

Ci = tr{To^(y - e/3^if^Z)Vo^} 

(16) 

= tr{^(y2)^o^T ^ 2To^(yeZ^)Bo$o + $W^(e'^^'")5o$o}. 
Now consider the matrix 

(17) ^-^0(^0 B^)[E{YeZ) E{e^ZZT))[o Boj 

If /? ^ Span(if), then the first row of ^'o is and the matrix reduces to 
that inside trace(-) on the right-hand side of (15). If /? G Span(ff), then the 
first row of ^'o is tq and the matrix reduces to that inside trace(-) of (16). 
Thus if we let C2 be tr(74), then it automatically normalizes the asymptotic 
distribution of to a Xp-k distribution in both cases. To further simplify 
the notation, let W denote the {p + 1) -dimensional vector {Y, eZ'^)'^ , and let 
diag(l, Bq) denote the p + 1 x p block-diagonal matrix with diagonal blocks 
1 (the scalar one) and Bq. Then we can express C2 as 

(18) C2 = tT{^^ diag(l, )S( W^) diag(l, Bo)^o}. 
The next theorem summarizes this general result. 

Theorem 4. Suppose that conditions (C1)-(C3) hold. Then, under the 
null hypothesis -ffo.fc '■ ^k+i = ■ ■ ■ = = 0, we have 

i=k+l 

where C2 is defined at (18). 

4. Asymptotic distribution for the general case. We now derive the 
asymptotic distribution of Y^^i=k+i ™ general case. The asymp- 
totic result of this section holds without conditions (C1)-(C3) — in its general 
form the asymptotic distribution is related only to the rank of the matrix B. 
Thus for clarity we will not refer to these conditions in the statement of the 
result. Under the linear conditional mean condition (A), 5iht is a subspace 
of the CMS, and the test helps us to identify a set of significant vectors that 
belongs to the CMS. Under the coverage condition (CI), 5iht is equal to 
the CMS, and the test helps us to identify the CMS itself. The point of this 
generalization is that (C3) is altogether removed, (C2) is replaced by the 
much weaker condition (A), and without (CI) we can still find vectors in 
the CMS but without the guarantee that they will span the CMS. 
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By expansion (3), the leading term of B — B, ignoring the error of mag- 



nitude Op{n is 



i^P - (3, H0 -P) + {H- H)(3, HP~\P - [5) 

+ Y,H^{H -H)HP-'^-^l5\. 
j=o / 

This can be written as the sum of p matrices of simpler structures, as follows: 
(/3 -/?,... , HP-\^P - /?)) + (0, {H - H)P, ...,{H- H)HP-^13) 

+ (0, 0, H{H - H)f3, H{H - H)HP~^p) + ■■■ 

+ {0,...,0,HP~\H-H)(3). 
Thus, the vector vec{B — B) can be written as 

+ ••• + 



+ 



\HP~\i3-P)J 
In other words. 



{H - H)(i 
\{H- H)HP-^p J 








(19) 



Yec{B - B) 







H U 



0^ ( 





\HP~^{H -H)(3j 
p-(3 \ 



H ij 



{H-H)P 
\{H - H)HP~'^PJ 



\HP-^ 
= MV. 

Here, M is a p'^ by constant matrix, and y is a random vector consisting 
of subvectors (3 — f3, . . . , {H — H)HP"'^P which, according to Lemma 1, can 
be approximated by sums of independent and identically distributed vec- 
tors. It follows that ^/nMV converges in distribution to a p^-dimensional 
multivariate normal random vector. 

To write an explicit form of the asymptotic distribution, let 

^, = ZY-P- {ZZ^ - /p)/3/2 - _ i)(3/2, 
(20) 6 = {e{ZZ^ -i^)-H- {ZZ^ - Ip)H/2 



H{ZZ' - Ip)/2 - (y^ - 1)H/2}W-'^P 



2,..., p. 
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Then, by Lemma 1, 

(H - H)W-^f3 = + Op(n-i), i = 2, . . . ,p. 

Thus, if we let ^ be the vector (^f , . . . then V = En{^) + Op(n~^), and 

consequently \/nV converges in distribution to a p^-dimensional multivari- 
ate normal with mean and covariance matrix E{£^^'^). Therefore, 

y^vec{r;5^(^ - B)^o} ^ N{0, (^0 Fof ME{^C^)M^ {^fo ro)/C2). 
Thus we have proved the following theorem. 

Theorem 5. In the general case, we have 

p ip-k)^ 

E E 

i=k+l 1=1 

where Ki, . . . , K(^p_j^-^2 are independent chi-squared random variables with 
one degree of freedom and lo, . . . ,uj^p_k^2 are the eigenvalues of the matrix 

(21) (^-0 ® rofME{^C'^)M^(^o ® ro)/C2, 

with C2, M and ^ defined by (18), (19) and (20), respectively. 

5. Implementation. In this section we describe how to estimate the var- 
ious unknown quantities involved in the asymptotic distribution of Tk = 
fc+iAj, for both the constrained and the general cases. In the con- 
strained case we only need to estimate C2, whereas in the general case we 
need also to estimate the coefficients uJi, . . . ,u;(p_^.)2. 

To estimate C2 , recall that Tq and ^'o are derived from the singular value 
decomposition of B. That is, the columns of Fq are the eigenvectors of the 
matrix BB^ corresponding to its zero eigenvalues, and the columns of ^'o 
are the eigenvectors of B^B corresponding to its zero eigenvalues. So, we 
let Tq be the p x p — j matrix whose columns are the p — j eigenvectors of 
1313^ corresponding to its smallest eigenvalues, in a descending order. In a 
similar manner construct ^'o, also of dimension p x p — j, from the matrix 
B^B. Furthermore, we will estimate Bq by its sample version 

Bo = 0,...,HP~^P). 

By the weak law of large numbers B and Bq consistently estimate B and 
Bq, and, under the null hypothesis Hqj, the matrices Fq and ^'o consistently 
estimate Fq and ^'q. We propose to estimate C2 by substituting the estimates 
Fq, ^'o and Bq for their population values Fq, ^'o and Bq in (18): 

C2 = tr{$^ diag(l, B^)En{WW^) diag(l, 5o)$o}, 
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where W is the (p+ l)-dimensional vector (Y ,eZ'^)'^ . By Slutsky's theorem, 
substituting C2 for C2 in Tj = C^^ Sf=j+i will not change its asymptotic 
distribution. 

To estimate wi, . . . , a;(p_;j)2 , let 

ii = ZY-p- {ZZ^ - Ip)p/2 - (y2 _ i)p/2, 

ii = {eiZZ^^ -lp)-H- (ZZP' - Ip)/2 

- [ZZ^ - Ip)H/2 - _ l)H/2}W-^P, i = 2,...,p, 

and let ^ be the p^-dimensional vector {S^f , . . . , ^J)"^. We estimate M in (21) 
by replacing H in the definition of M, as given in (19), by H. The coefficients 
LOi, . . . ,u;(p_fc)2 are then estimated by the eigenvalues of 

($0 ® f o)^Mi?„(eT)M^($o ® f o)/C2. 

As we can see from its construction, the general test does not reduce nu- 
merically to the constrained case when conditions (C1)-(C3) are satisfied, 
though in this case the two asymptotic distributions are first-order equiva- 
lent because cDi , . . . , Cj(^_^y converges in probability to l<ji , . . . , w^^.^p , which 
contain p—k I's and (p — k)'^ — {p — k) O's. Because the test for the con- 
strained case uses this special 0~1 structure of the w's, it is expected to 
outperform the general test when conditions (C1)-(C3) are satisfied. 

6. Simulation results. 

6.1. Test levels. In this section we present selected results from a sim- 
ulation study to investigate the levels of the chi-squared and weighted chi- 
squared tests, one goal being to provide support for the validity of our 
asymptotic results. The tests have the same statistic for the hypothesis 

^ p ^ 
dim(5iHT)=j, Tj = C2^ ^ Aj, 

i=j+i 

but use different reference distributions. The yH reference distribution is 
appropriate under conditions (C1)-(C3) of Theorem 4. Otherwise the ref- 
erence distribution is the weighted chi-squared of Theorem 5. The scaling 
constant C2 and the weights uji for the weighted chi-squared reference dis- 
tribution were estimated as indicated in Section 5. For each simulation run 
the estimated test levels were based on 1000 replications and where rele- 
vant the chi-squared and weighted chi-squared tests were performed on the 
same data. There is a substantial literature on computing tail areas of dis- 
tributions of linear combinations of chi-squared random variables. See Field 
(1993) for an introduction. For reference, the nominal standard errors of the 
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estimated levels of nominal 1, 5, 10 and 15 percent tests are about 0.31, 
0.69, 0.95 and 1.13. 

Table 1 contains results for a null regression with four independent stan- 
dard normal predictors and an independent standard normal response. The 
estimated levels of the tests seem quite far from the nominal levels for n = 25 
observations, but the agreement seems good for both tests with more than 
about n = 100 observations. 

Tables 2 and 3 contain results based on the model 

(22) y = Zi + 0.2(Zi + Z2)2 + aiV(0,l) 

with p = 4 independent standard normal predictors, and various sample 
sizes and values for a. Because dim(5iHT) = 2, we studied the behavior of 
T2 in each case. In the first part of Table 2 we held the sample size fixed 
at n = 50 and varied a from to 1.6. As a increases the estimated levels of 
both tests tend to decrease, ending with quite conservative tests at o" = 1.6. 
Note, however, that a = 1.6 is large compared to the variance of Zi, with 
var(e)/var(Zi) =2.56. At this error rate it is not surprising that the per- 
centiles for both tests differ quite a bit from their nominal value, because the 
power of Ti is not much larger than the nominal error rate. In the second 
part of Table 2 we held a fixed at 1.6 and increased the sample size. As the 
sample size increases we see the asymptotic approximations improving, end- 
ing with reasonable results at n = 400. Our general conclusion from Tables 1 
and 2 and other simulation results not reported here is that the results are 
behaving as expected, which supports our analytic calculations and method 
of implementation suggested in Section 5. Perhaps as expected, the weighted 
chi-squared seems to take a larger sample size for the asymptotics to take 

Table 1 

Estimated level of nominal 1, 5, 10 and 15 percent 
chi-squared (x^) and weighted chi-squared (y^ ) tests based 
on To for a OD regression with p = 4 independent standard 
normal predictors and an independent standard normal 
response 



Nominal level (%) 



n 


Test 


1 


5 


10 


15 


25 







2.4 


8.2 


15.3 


25 




2.4 


8.7 


15.8 


21.6 


50 




0.1 


2.4 


9.4 


15.4 


50 


t 


0.8 


6.6 


12.1 


18.2 


100 


x' 


0.8 


4.6 


9.9 


15.4 


100 


t 


1.5 


6.4 


11.8 


17.4 


200 


x' 


1.1 


4.1 


9.3 


14.5 


200 




1.3 


4.3 


10.5 


14.8 
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Table 2 

Estimated level of nominal 1, 5, 10 and 15 percent 
chi-squared (x^ ) and weighted chi-squared ) tests based 
on T2 for model (22) wtth p = 4 independent standard 
normal predictors Zj 









Nominal level. 


, n = 50 




<J 


Test 


1 


5 


10 


15 







1.1 


4.8 


9.5 


14.6 





t 


0.1 


3.3 


8.3 


13.7 


0.2 




1.0 


5.9 


10.4 


15.1 


0.2 




0.3 


3.8 


8.7 


15.4 


0.4 




1.1 


4.7 


10.0 


14.6 


0.4 







1.9 


5.7 


10.6 


0.8 


x' 


0.5 


3.5 


7.7 


11.4 


0.8 


t 





1.0 


3.7 


6.3 


1.6 


x^ 





0.7 


3.1 


5.7 


1.6 


t 





0.2 


0.7 


1.6 








Nominal level, cr = 1.6 




n 


Test 


1 


5 


10 


15 


100 


x' 


0.1 


2.9 


6.3 


9.4 


100 


t 


0.1 


0.9 


1.9 


4.1 


200 


x' 


1.3 


5.7 


10.4 


14.2 


200 


t 





1.9 


5.0 


8.6 


400 


x' 


1.4 


4.6 


9.4 


15.5 


400 


t 


0.7 


4.1 


9.5 


14.2 



hold. Additionally, there is a tendency for the weighted chi-squared test to 
be conservative. 



Table 3 

Estimated level of nominal 1, 5, 10 and 15 
percent chi-squared tests based on T2 for 
model (22) with p independent standard 
normal predictors Zj 



p 




Nominal level, 


n = 100 




1 


5 


10 


15 


4 


1.1 


4.7 


10.0 


14.2 


6 


0.7 


4.9 


9.9 


15.0 


8 


0.9 


4.0 


8.6 


15.9 


12 


0.3 


4.6 


10.2 


15.1 


16 


0.5 


3.9 


7.6 


12.0 
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In Table 3 we investigate the impact on the chi-squared test of increas- 
ing the number of unimportant predictors, holding n = 100 and a = 0.2. 
Although there seems to be a little tendency for the estimated levels to de- 
crease as p increases, overall increasing the number of predictors does not 
seem to have much of an impact. 

In Table 4 we consider model (22) with the error (tA^(0, 1) term replaced 
by 0.5(x2 ~ 2). Replacing the normal error with a chi-squared error did not 
seem to have a notable impact on the results. Because the error does not 
satisfy (C2) we have used the weighted chi-squared reference distribution. 

Finally, we present a few confirmatory results based on p = 5 standard 
normal predictors and a response generated as 

(23) Y = e0-3(2^i+322) + l.6sin(Zi - Z2) + criV(0, 1). 

Letting SD denote a population standard deviation, the signal-to-noise ra- 
tio SD(E(y|Z))/a for model (23) is about 0.4 times that for model (22), 
so the mean function of (23) should be harder to estimate. Table 5 con- 
tains estimated levels of T2 for model (23) for three sample sizes and four 

Table 4 

Estimated level of nominal 1, 5, 10 and 15 percent 

weighted chi-squared tests based on T2 for a 2D 
regression with p = 4 independent standard normal 
predictors Zj and response 
Y = Zi+ 0.2(Zi + Z2f + 0.5(xi - 2) 







Nominal level (%) 




n 


1 


5 


10 




15 


50 


0.1 


2.5 


6.5 




11.0 


100 


0.5 


3.8 


8.9 




13.1 


200 


1.0 


5.8 


9.9 




14.6 






Table 5 








Estimated level of nominal 1, 5, 10 


an 


d 15 percent 


chi-squan 


sd (x^) and weighted chi-squared (y^ ) tests based 


on 


T2 for simulation model (23) 


with a = 0.2 








Nominal level (%) 




n 


Test 


1 5 




10 


15 


50 




0.5 4.1 




9.5 


13.7 


50 


t 


2.1 




4.6 


8.7 


100 




0.6 3.3 




7.2 


12.1 


100 


t 


0.2 1.2 




4.5 


8.2 


200 


x' 


1.2 4.5 


10.5 


14.5 


200 


t 


0.4 2.4 




6.3 


11.5 
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Table 6 

Distribution of dimension estimates k out of 1000 trials based 
on the sequential chi-squared (x^) and weighted chi-squared 
(X^ ) tests with constant nominal level a 

k with n = 50 k with n = 100 

a ~6 i 2 >^ "O i 2 >3~ 



0.001 


4 


924 


72 








322 


678 





0.01 





631 


365 


4 





85 


904 


11 


0.05 





308 


656 


36 





20 


931 


49 


0.10 





187 


731 


82 





9 


901 


90 


0.15 





136 


733 


131 





3 


859 


138 



0.001 


171 


692 


137 





81 


412 


507 





0.01 


67 


527 


404 


2 


37 


149 


809 


5 


0.05 


18 


284 


666 


32 


9 


38 


911 


42 


0.10 


5 


167 


746 


82 


4 


18 


899 


79 


0.15 


2 


119 


755 


124 


1 


7 


867 


125 



The model is (22) with a — 0.4 and sample sizes 50 and 100. 



nominal levels. These results are qualitatively similar to those discussed pre- 
viously and confirm the conservative nature of the weighted chi-squared test 
in smaller samples. 

6.2. Estimation o/dim(5iHT)- In this section we present first results on 
the behavior of the sequential testing procedure discussed in Section 2.2 
for estimating fc = dim(5iHT)- We consider only estimates based on using 
the same nominal level for each of the sequential tests, although in a more 
comprehensive investigation it might be desirable to include variable levels. 

Reasoning in the context of model (22) with dim(5iHT) = 2, if the leading 
tests of A; = and k = 1 have power 1, then all of the estimation error 
arises from the level a of the test of k = 2, resulting in estimates k = 2 with 
probability 1 — a and k > 2 with probability a. Ideally, we would like to 
make a small, while maintaining high power in the leading tests. Leading 
tests with small values of a will have relatively low power and will tend to 
result in underestimation of k. We can increase the power of the leading tests 
by increasing a, but this also increases the probability of overestimation. 

For instance. Table 6 gives the empirical distribution of k out of 1000 trials 
based on the sequential chi-squared (x^) and weighted chi-squared (x^) tests 
for model (22). With n = 50 we would prefer a level around 0.1 since the 
fraction of correct decisions was observed to change little for a > 0.1 until it 
began to decrease. With n = 100, a level around 0.05 tends to balance over- 
and underestimation and produce the best results. With larger sample sizes 
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or smaller values of a, a level less than 0.05 may be preferred. Results for 
model (23) were qualitatively similar, but not quite as strong since its mean 
function is harder to estimate. 

Overall, we found no compelling reason to prefer estimates with a > 0.15. 
Tests with a = 0.05 or a = 0.1 tended to produce good results in our simu- 
lations, but tests with a < 0.05 might yield better estimates with a signifi- 
cantly larger sample size or stronger signal. 

6.3. Direction estimation. Given = 2 for models (22) and (23), we stud- 
ied the accuracy of the IHT estimates of the CMS by computing the absolute 
correlation between Zj and the fitted values from the OLS regression of Zj 
on the first two IHT predictors, j = 1,2. Shown in Table 7 are three quan- 
tiles of the empirical distribution of these correlations over 1000 simulations 
for model (22). The results for model (23) are qualitatively similar, but as 
expected the correlations are smaller at the same sample size and standard 
deviation a. For example, the quantiles for Zi under model (23) with n = 100 
and a = 0.4 were observed to be go.o5 = 0.89, go. 5 = 0.97 and go.95 = 0.996. 

7. Ozone data. In addition to simulations with normal predictors, we 
analyzed several different simulated and real data sets with nonnormal pre- 
dictors using IHT methodology and found that in nearly all cases the chi- 
squared and weighted chi-squared reference distributions result in the same 
estimate of the dimension of the CMS. The ozone data [Breiman and Fried- 
man (1985)] considered briefly in this section is an instance where the esti- 
mates of dimension differ. 

Table 7 

Quantiles (go. 05, go. 5 and go. 95 j of the empirical distribution of the 
absolute correlation between Zj and the fitted values from the OLS 
regression of Zj on the first two IHT predictors, j = 1,2, based on 
1000 replications from model (22) with three values for a and two 
sample sizes 



(T 




n = 50 






n = 100 




90.05 


qo.5 


(70.95 


qo.05 


qo.5 


(70.95 








(A) Z, 








0.2 


0.98 


0.995 


0.9996 


0.99 


0.998 


0.9998 


0.4 


0.97 


0.993 


0.9994 


0.98 


0.997 


0.9997 


0.8 


0.94 


0.99 


0.999 


0.97 


0.994 


0.9996 
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The response Y is atmospheric ozone concentration, and there are seven 
predictors: Daggett pressure gradient (DGPG, mmHg), humidity (HMDT, 
percent), visibihty (VSTY, miles), wind speed (WDSP, mph), Vandenburg 
500 milhbar height (VDHT, m), the logarithms of Sandburg Air Force Base 
temperature (SBTP, degrees C), and inversion base temperature (IETF, 
degrees F). The logarithm of the two temperature predictors was used to help 
to ensure that the linearity condition (A) hold to a useful approximation. 
Because we use only IHT methodology, there is no reason to consider the 
constant covariance condition (B). 

The test results from using the chi-squared and weighted chi-squared ref- 
erence distributions for Tj are shown in Table 8. Use of the chi-squared 
reference distribution indicates that the dimension of the CMS is 3, while 
the weighted chi-squared reference distribution indicates two dimensions. 

Shown in Figure 1 is a scatterplot of the response versus the first IHT 
predictor vjz, where vj is the eigenvector of nBB^ corresponding to its jth 
largest eigenvalue Xj . A 3D plot (not shown) of the residuals e versus the first 
two IHT predictors {vf ZjvJ Z) exhibits a saddle, confirming that dim(5iHT) 
is at least 2. We were unable to find any notable graphical support for a third 
IHT predictor and consequently we conjecture that the results of the third 
chi-squared test in Table 8 are due to a failure of condition (C2) or (C3). 
In any event, because the conditions needed for the weighted chi-squared 
reference distribution are considerably less restrictive than those needed for 
the chi-squared, the weighted chi-squared p-values are likely more reliable. 

8. Discussion. In this article we developed two asymptotic tests for the 
dimension k of the IHT subspace 5iht- The tests use the same statistic 

~ Z^iLj+i foi' the hypothesis rank(i?) = j, but have different refer- 
ence distributions depending on characteristics of the regression. The Xp~j 
reference distribution is appropriate under conditions (C1)-(C3) of Theo- 
rem 4. Otherwise, in practically full generality, the reference distribution 
is the weighted chi-squared of Theorem 5. We typically use both reference 
distributions in practice, as illustrated in Table 8. 



Table 8 
Test results for the ozone data 
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Fig. 1. Scatterplot of ozone versus the first IHT predictor with a LOWESS smooth. 



Both tests are derived under the working coverage condition (CI), which 
is typically assumed in similar asymptotic developments in the literature. 
The working assumption is supported partly by the fact that IHT incor- 
porates OLS and PHD in a way that does not evoke the constant variance 
condition (B), and is capable of discovering monotone and nonmonotone 
trends. Our experience indicates that IHT often works very well in picking 
up the patterns in a regression relation, so long as there is a nonzero vec- 
tor to initiate the iteration process. Even if the coverage assumption does 
not hold, the general test still finds the significant vectors in the CMS, but 
the span of these vectors need not cover the CMS. In such cases, the tests 
should be viewed as a means of finding significant vectors in iSiht, which is a 
subspace of the CMS under the linear conditional mean condition (A). The 
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issue of coverage is a fundamental and challenging one and deserves care- 
ful and in-depth investigation for IHT as well as other dimension reduction 
methods. 

Cook and Critchley (2000) found that the CS automatically expands to 
incorporate regression outliers and mixtures. Consequently, they argued that 
the acknowledged sensitivity of CS methods like SIR and SAVE [see, e.g.. 
Gather, Hilker and Becker (2002)] can be viewed as an advantage, since 
they have the ability to identify outliers and mixtures along with the main 
regression. In effect, methods for estimating the CS provide their own di- 
agnostics. We conjecture that IHT is similarly self-diagnosing for outliers 
that affect the regression mean. Although we have not performed theoret- 
ical work to trace the diagnostic limits of IHT, various simulation results 
suggest that they might be fairly wide. For example, with four standard nor- 
mal predictors, we generated 50 observations according to the linear model 
Y = Zi + 0.2N{0, 1), and then added a 51st observation with Y = 6 and cor- 
responding Zj = 2, j = 1, . . . ,4. IHT estimated the dimension of the CMS 
to be 2, and the 3D summary plot clearly showed the linear mean structure 
and the outlier. Removal of the outlier resulted in a one-dimensional esti- 
mate of the CMS, as expected. Alternatively, we might deal with outliers by 
designing a robust version of IHT, replacing the sample moments by more 
robust estimators along the lines that Gather, Hilker and Becker (2001) used 
to investigate a robust version of SIR. This, however, is beyond the scope of 
the present paper. 

The availability of these tests means that IHT is now a fully functioning 
methodology on a par with PHD. But, unlike PHD, it does not require the 
constant covariance condition (B) for either estimation or testing. In situ- 
ations where PHD is applicable [conditions (C1)-(C3)], IHT automatically 
combines PHD with OLS, taking advantage of the ability of OLS to find 
linear trends in the mean function, and the ability of PHD to find nonlinear 
trends. 



Throughout this section the identity matrix of dimension p will be written 
as I rather than Ip. 

Proof of Lemma 1. By definition. 



APPENDIX: PROOFS OF LEMMAS 



Let us first expand 



(24) 



(3 = I]-^/^a-^En{Z -Z){Y-Y). 
Note that 



i: = En{zz^)- zz"^ 

= En{ZZ^) + Op(n-^) = / + EniZZ^ - I) + Op(n-i), 
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where En{ZZ'^ — I) is of the order Op{n~^^'^). We know that S"^/^ must 
be of the form / + An for some random matrix An of the order Op(n~^/^). 
Therefore, 

{I + Anf{I + En{ZZ'^-I)) = I. 

The left-hand side is 

/ + En{ZZ^ + 2An + Op(n-i). 
Therefore An = -EniZZ'^ - -f)/2 + Op{n~^) and 

(25) £-1/2 = / _ EniZZ"^ - I)/2 + Op(n-i). 
By a similar argument one can show that 

(26) a-^ = l-En{Y^-l)/2 + 0p{n-^). 
It is easy to see that 

EniZ-Z){Y-Y)=En{ZY) + Op{n--^) 

(27) 

= P + En{ZY)-(] + Op{n"^). 

Now substitute (25), (26) and (27) into (24) and expand the right-hand 
side of (24) to obtain expansion (4). 

Next let us prove expansion (5). By definition, 

(28) H = S-i/2^„[e(Z -Z){Z- Zf]^'^/^. 

We have already expanded S-^/^. Now let us expand En[e{Z — Z){Z — Z)'^]. 
We have 

En[eiZ - Z){Z -Zf] = EnieZZ^) - ZEnieZ^) - En{eZ)Z^ + Op{n-^). 

Because Z = Op{n~^^^), we need only expand En{eZ) so that the error is of 
the order Op{n~^/'^). Note that 

EnieZ) = En[{a-\Y -Y) - P^^-'/\Z -Z))Z] 

= a-'En[(Y -Y)Z] - En[ZiZ-Zf]^~^/'p. 

It is easy to see that 







= \^Op{n 


-1/2. 


En[{Y- 


-Y)Z] 


= P + Op{n- 


-1/2 


En[Z{Z 


-Zf\ 


= I + Op{n 


-1/2- 






= I + Op{n 


-1/2- 




/3 


= (3 + Op{n 


-1/2 
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Therefore, 

and consequently, 

(29) Ene{Z -^){Z -Zf = EniZZ^ + Opin'^). 

We now expand the right-hand side so that the error is of the order Op{n~^). 
We have 

(30) EniZZ^ = a''En[{Y -Y)ZZ^] - En[F^~^'\Z - 'Z){ZZ^)\. 
The first term on the right-hand side is 

a-^EnW -Y)ZZ^\ 

= a-^Er\{Y -Y){ZZ^ 

(31) 

= <T-iK[y(zz^ - /)] + Op(n-i) 
= (1 - S„(y2 _ \)l2)En'^{ZZ^ - /)] + Op(n-i). 
The second term on the right-hand side of (30) is expanded as 

i5;„[/3^s-i/2(z-z)(zz'^)] 

= i?„[/3^S-i/2(Z-Z)(ZZ^-/)] 
= Er\h^Yr^l^Z{ZZ'^ - /)] + Op(n-i). 
The (z, j)th element of the p x p matrix on the right-hand side is 

V 

Y^{Y.-^'^^)kEn[Zk{ZiZ, - 6ij)l 

k=l 

where (S~^/^/3)fc is the kth element of the vector and 6ij is the (i, j)th 

element of the p-dimensional identity matrix /. Because Z has a standard 
multivariate normal distribution, the expectation of Zk{ZiZj — 5ij) is zero 
for any i,j,k. Therefore En{Zk{ZiZj — 5ij)) = Op{n~^^^), and hence if we 
replace the S and /? by / and /3, then the error incurred has the magnitude 
Op{n~^). It follows then that 

(32) ^„[/3^S-V2(z -Z){ZZ^)] = En[fZ[ZZ'^ - I)] + Op(n-i). 
Now substitute (31) and (32) into (30) to obtain 

En{eZZ^) = En[e{ZZ'^ - I)] - \En{Y^ - l)En[Y{ZZ^ - I)] + Op{n-^). 
However, note that 

^[e(ZZ'^ - I)] = E{eZZ'^) = H. 
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Hence 



En{eZZ^) = H + En[e{ZZ^ - j) - H] - \En{Y^ - 1)H + Op(n-i), 
which, combined with (29), imphes that 
Ene{Z-Z)iZ-Zf 

(33) 

= H + En[e{ZZ^ -I)-H]- ^En{Y^ - 1)H + Op{n~^). 

Now substitute (25) and (33) into (28), and expand the right-hand side of 
(28) to obtain the desired expansion (5). □ 

Proof of Lemma 2. (i) Since j3 ^ Span{H) and HP, HP~^/3 belong 
to Span(//), /? ^ Span{H(3, . . . , H^~^f3}. Meanwhile, we know that 



If the first row of ^'o is not 0, then (3 can be written as a linear combination 
of H(3, . . . , H^~^/3, which is a contradiction. 

(ii) First, consider the case /? _L Span(ff) (which includes the case H = 0). 
Then B = (/3, 0, . . . , 0), rank(i?) = 1, and ^'o is a p by p — 1 matrix. Write 



where $o is a p — 1 by p — 1 matrix. Since $o is an orthonormal matrix, its 
first row must contain a nonzero element. 

Next, consider the case where /? is not orthogonal to Span(ff). In this 
case rank(i?) > 2. Suppose first that rank(i?) = 2. Then ^'q is a p by p — 2 
matrix. We claim that H^P / 0. This is because if H^P = H{HP) = 0, then 
Hp _L Span(ff), but this implies HP = since HP belongs to Span(ff). This 
means that /3_L Span(i?), which is a contradiction. Hence 



where Aq is a p - 2 by p - 2 matrix. Then (i?^/?, . . . ,iJP~V)Ao = 0. In 
other words, the columns of Aq are orthogonal to the rows of the matrix 
{H'^P, . . . , HP~^P), which contains at least one nonzero row. Consequently 
the p — 2 columns of Aq belong to a (p — 3) -dimensional space, so that they 
cannot be an orthogonal set. But this contradicts the fact that the columns 
of ^'o are orthogonal. 



{P,HP,...,HP- 



/3)^'o = 0. 




{H^P,...,HP-^ 



P)^0. 



Now suppose that the first row of <I>o is and write 
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Next, suppose that rank(B) = k>2.We first prove that Hf3 G Span{H'^f3, HP-^(5). 
Prom P ^ Span(f/') it follows that the vectors H^P, . . . , HP~^P belong to the 
subspace spanned by the vectors H/d, . . . , H^~^P, because otherwise we have, 
for some j £ {k, . . . ,p — 1} and some ci 7^ 0, 

= ciP + C2HP +■■■ + CkH^-^p, 

contradicting the assumption /3 ^ Span(i7). By the same argument we can 
deduce that the vectors P, . . . , H^^^P must belong to the subspace spanned 
by H'^P, . . . , H^~^P. In particular, 

h^P = {h'^P,...,h^-^P)5 

for some 5 in R^~'^. Then 

(34) H{H^~^P - {HP, H^~^P)6) = 0. 

In other words, the vector H^~^P — {HP, . . . , H^~'^P)6 is orthogonal to the 
rows, and hence columns, of H . However, both vectors in this difference 
belong to Span(i7), and so we have 

H^~^p={Hp,...,H^~^p)6. 

Consequently H'^'^P, and hence all the subsequent vectors H'', . . . , H^'^P, 
belong to the space spanned by HP, . . . , H'^~'^P, which contradicts the as- 
sumption that rank(i?) = k. 

However, HHp belongs to Span{H^P, H^-^p), then the matrix {H'^P, H^-^p) 
has rank at least k — 1, because we know that HP, H^P, . . . , H^~^P are lin- 
early independent. Hence the solution space of the matrix {H'^P, ■ ■ ■ , HP~^P)x = 
has dimension at most {p — 2) — {k — l) = p — k — 1. Now if the first two rows 
of ^'o are zero, then there are p — k orthogonal solutions to that equation, 
which is impossible. □ 

Proof of Lemma 3. Since P belongs to Span(i7) it can be written as 
P = Hr] for some rj in M^. Pirst assume that rank(i3) = 1. We claim that 
H'^rj^O, otherwise Hi] is orthogonal to Span(//), and must therefore be 
because Hr] belongs to Span(//). If the first row of ^'o is 0, then 

{H^7],...,HPr])^o = 0. 

Therefore thep — 1 columns of <I>o are orthogonal to the rows of {H'^rj, . . . , H'P~^r]), 
which contains a nonzero row. But if so, the columns of $0 belong to a 
{p — 2)-dimensional subspace of M^"^, and cannot be an orthogonal set — a 
contradiction. 

Now suppose that rank(i?) = k>2. We first prove that P G Span{HP, . . . , W^~^P). 
Otherwise, by an argument similar to that used in Lemma 2, the vectors 
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H^I3, HP-^(3 all belong to the space spanned by HP, . . . , H'^ In par- 
ticular, for some 6 G M'^"^, 

H^P = {HP,...,H^-'P)5, 

which implies 

H''+\ = {H'^ri,...,H''r])5. 

But then, as we argued in the proof of Lemma 2, following display (34), 

H^ri = {Hr],...,H^~\)5 or H^~^ [3 = {(5, . . . ,H^~'^ 13)5. 

This implies that H^~^(3, and hence all its subsequent vectors H^P, . . . , HP~^P, 
belong to the space spanned by /3, . . . , H^~'^P, contradicting the assumption 
that rank(i?) = k. 

That P belongs to Span{HP, . . . , BP~^P) implies that the matrix {HP, . . . , H^ 
has rank k, because /?,..., H^~^P are linearly independent. Therefore the 
equation {HP, . . . , HP~^P)x = has at most {p — 1) — k=p — k — 1 lin- 
early independent solutions. However, if the first row of is zero, then 
{Hp, . . . , H'P~^P)x = has p — k orthogonal solutions — a contradiction. □ 
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